An R package for MAVE QC
A simple example for QC
What’s Screen QC?
You may need to load dependencies first; please refer to the installation section in the README.
library(MAVEQC)
import_sge_files reads the input files for the samples in the sample sheet and returns a list of SGE objects, one per sample, holding all the input datasets.
sge_objs <- import_sge_files("/path/to/input/directory", "sample_sheet.tsv")
## Importing files for samples:
## |--> hgsm3_d0_r1
## |--> hgsm3_d4_r1
## |--> hgsm3_d7_r1
## |--> hgsm3_d15_r1
## |--> hgsm3_d4_r2
## |--> hgsm3_d7_r2
## |--> hgsm3_d15_r2
## |--> hgsm3_d4_r3
## |--> hgsm3_d7_r3
## |--> hgsm3_d15_r3
sge_objs[[1]]
## An object of class SGE
## |--> sample name: hgsm3_d0_r1
## |--> library type: screen
## |--> library name: SMARCA4_exon26
## |--> 5' adaptor: CTGACTGGCACCTCTTCCCCCAGGA
## |--> 3' adaptor: CCCCGACCCCTCCCCAGCGTGAATG
## |--> ref seq: CCGGTGCTGGGCTCACCTCATCCTGCTCCTCGTGCTCCAGGATGGCCTGCAGGAAGGCGCGCCGCTCATGGCTGGAGGACTTCTGGTCGAACATGCCGGCCTGGATCACCTTCTGGTCCACGTTGAGCTTGTACTTGGCTGCAGCTAGGATCTTCTCCTCCACGCTGTTGACGGTGCAGAGGCGGAGCACACGCACCTCGTTCTGCTGCCCGATGCGGTGGGCTCGGTCCTGCGCTTGCAGG
## |--> pam seq: CCGGTGCTGGGCTCACCTCATCCTGCTCCTCGTGCTCCAGGATGGCCTGCAGGAAGGCGCGCCGCTCATGGCTGGAGGACTTCTGGTCGAACATGCCGGCCTGGATCACCTTCTGGTCCACGTTGAGCTTATATTTAGCTGCAGCTAGGATCTTCTCCTCCACGCTGTTGACGGTGCAGAGGCGGAGCACACGCACCTCGTTCTGCTGCCCGATGCGGTGGGCTCGGTCCTGCGCTTGCAGG
## |--> No. of library-dependent counts: 3273
## |--> No. of library-independent counts: 370889
## |--> valiant meta: 3273 records and 24 fields
## |--> 3273 library-dependent count ids matched in valiant meta oligo names
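In the summary above, ref seq and pam seq have the same length and differ at only a few bases. pam seq appears to be the reference carrying the PAM/protospacer protection edits, although this example does not state that explicitly. The base-R sketch below locates the differing positions. It is shown on a 20-nt excerpt copied from the two sequences, and the same call can be applied to the full strings. diff_positions is a hypothetical helper and is not part of MAVEQC.

```r
# Hypothetical helper (not a MAVEQC function): 1-based positions at which
# two equal-length sequences differ
diff_positions <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  which(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

ref_excerpt <- "GAGCTTGTACTTGGCTGCAG"  # excerpt of ref seq
pam_excerpt <- "GAGCTTATATTTAGCTGCAG"  # same region of pam seq
diff_positions(ref_excerpt, pam_excerpt)  # 7 10 13
```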
Sample QC runs on a QC object, which create_sampleqc_object builds from the list of SGE objects.
Sample QC also needs reference samples for guidance. These can be provided as a vector of sample names or as a vector of sample indices in the sample sheet.
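The indices c(2,5,8) passed to select_objects refer to positions in the sample sheet. The base-R sketch below shows how they correspond to sample names. It assumes that the sheet lists the samples in the order shown in the import log above, and sample_names is a hand-written vector typed in for illustration.

```r
# Sample order as printed by import_sge_files (assumed to match the sample sheet)
sample_names <- c("hgsm3_d0_r1", "hgsm3_d4_r1", "hgsm3_d7_r1", "hgsm3_d15_r1",
                  "hgsm3_d4_r2", "hgsm3_d7_r2", "hgsm3_d15_r2",
                  "hgsm3_d4_r3", "hgsm3_d7_r3", "hgsm3_d15_r3")

# Day-4 replicates used as reference samples
ref_names <- c("hgsm3_d4_r1", "hgsm3_d4_r2", "hgsm3_d4_r3")

# Convert names to positions, and positions back to names
ref_idx <- match(ref_names, sample_names)
ref_idx                 # 2 5 8
sample_names[ref_idx]   # the three hgsm3_d4 samples
```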
samqc <- create_sampleqc_object(sge_objs)
samqc@samples_ref <- select_objects(sge_objs, c(2,5,8))
samqc <- run_sample_qc(samqc, "screen")
## Filtering by the total number of reads...
## Filtering by low counts...
## |--> Creating k-means clusters...
## |--> Filtering using clusters...
## |--> Filtering on hgsm3_d0_r1
## |--> Filtering on hgsm3_d4_r1
## |--> Filtering on hgsm3_d7_r1
## |--> Filtering on hgsm3_d15_r1
## |--> Filtering on hgsm3_d4_r2
## |--> Filtering on hgsm3_d7_r2
## |--> Filtering on hgsm3_d15_r2
## |--> Filtering on hgsm3_d4_r3
## |--> Filtering on hgsm3_d7_r3
## |--> Filtering on hgsm3_d15_r3
## Filtering by depth and percentage in samples...
## Filtering by library mapping...
## Filtering by library coverage...
## Sorting library counts by position...
## |--> Sorting on hgsm3_d0_r1
## |--> Sorting on hgsm3_d4_r1
## |--> Sorting on hgsm3_d7_r1
## |--> Sorting on hgsm3_d15_r1
## |--> Sorting on hgsm3_d4_r2
## |--> Sorting on hgsm3_d7_r2
## |--> Sorting on hgsm3_d15_r2
## |--> Sorting on hgsm3_d4_r3
## |--> Sorting on hgsm3_d7_r3
## |--> Sorting on hgsm3_d15_r3
## Calculating gini coefficiency...
## Mapping consequencing annotation...
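The log shows that low-count filtering creates k-means clusters, but this example does not show the exact clustering used by run_sample_qc. Purely as an illustration of the idea, a two-cluster k-means on log-scaled counts can separate a low-count group from the rest. flag_low_counts and the toy counts are hypothetical, and this is not MAVEQC's implementation.

```r
# Hypothetical illustration (not MAVEQC's method): flag low-count oligos
# with a two-cluster k-means on log10(count + 1)
flag_low_counts <- function(counts) {
  x <- matrix(log10(counts + 1), ncol = 1)
  # Start the two centres at the extremes so the result is deterministic
  init <- matrix(c(min(x), max(x)), ncol = 1)
  km <- kmeans(x, centers = init)
  low_cluster <- which.min(km$centers[, 1])
  km$cluster == low_cluster  # TRUE for oligos in the low-count cluster
}

counts <- c(0, 1, 2, 3, 250, 300, 410, 520, 380)  # toy per-oligo counts
flag_low_counts(counts)  # TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
```

Seeding the centres at the minimum and maximum avoids the run-to-run variation of random starts. A real filter would also need a rule for libraries where no clear low-count group exists.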
samqc
## An object of class sampleQC
## |--> samples:
## |--> hgsm3_d0_r1
## |--> hgsm3_d4_r1
## |--> hgsm3_d7_r1
## |--> hgsm3_d15_r1
## |--> hgsm3_d4_r2
## |--> hgsm3_d7_r2
## |--> hgsm3_d15_r2
## |--> hgsm3_d4_r3
## |--> hgsm3_d7_r3
## |--> hgsm3_d15_r3
## |--> reference samples:
## |--> hgsm3_d4_r1
## |--> hgsm3_d4_r2
## |--> hgsm3_d4_r3
## |--> QC results:
## |--> hgsm3_d0_r1: TRUE
## |--> hgsm3_d4_r1: TRUE
## |--> hgsm3_d7_r1: TRUE
## |--> hgsm3_d15_r1: TRUE
## |--> hgsm3_d4_r2: TRUE
## |--> hgsm3_d7_r2: TRUE
## |--> hgsm3_d15_r2: TRUE
## |--> hgsm3_d4_r3: TRUE
## |--> hgsm3_d7_r3: TRUE
## |--> hgsm3_d15_r3: TRUE
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
## |--> NA: NA
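The sample QC log also reports a Gini coefficient step. The Gini coefficient summarises how unevenly reads are spread across library oligos. It is 0 when every oligo has the same count and approaches 1 when a few oligos take most of the reads. The sketch below uses the standard formula on sorted counts. Which variant MAVEQC computes is not shown here, so treat this as an assumption.

```r
# Gini coefficient of a count vector (standard formula on sorted values):
# G = sum((2i - n - 1) * x_(i)) / (n * sum(x))
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

gini(c(5, 5, 5, 5))   # 0, perfectly even library
gini(c(0, 0, 0, 10))  # 0.75, all reads on one oligo
```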
Plotting from the QC object requires an output directory, passed as plotdir.
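outputdir is not defined in this example. The setup below creates it before plotting, in case the plotting functions do not create it themselves. It uses a placeholder location under the session's temporary directory, which you should replace with your own path.

```r
# Placeholder output location; replace with your own directory
outputdir <- file.path(tempdir(), "maveqc_plots")
dir.create(outputdir, showWarnings = FALSE, recursive = TRUE)
```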
qcplot_readlens(samqc, plotdir = outputdir)
qcout_sampleqc_length(samqc)
qcplot_stats_total(samqc, plotdir = outputdir)
qcout_sampleqc_total(samqc)
qcplot_stats_accepted(samqc, plotdir = outputdir)
qcout_sampleqc_library(samqc)
qcout_sampleqc_cov(samqc)
qcplot_position(samqc, "screen", plotdir = outputdir)
qcout_sampleqc_pos_cov(samqc)
qcplot_position_anno(samqc, c("hgsm3_d4_r1", "hgsm3_d4_r2", "hgsm3_d4_r3"), type = "lof", plotdir = outputdir)
qcout_sampleqc_pos_per(samqc)
DESeq2 requires coldata, a table giving the replicate and condition of each sample, for example:
| | replicate | condition |
|---|---|---|
| hgsm3_d4_r1 | R1 | D4 |
| hgsm3_d7_r1 | R1 | D7 |
| hgsm3_d15_r1 | R1 | D15 |
| hgsm3_d4_r2 | R2 | D4 |
| hgsm3_d7_r2 | R2 | D7 |
| hgsm3_d15_r2 | R2 | D15 |
| hgsm3_d4_r3 | R3 | D4 |
| hgsm3_d7_r3 | R3 | D7 |
| hgsm3_d15_r3 | R3 | D15 |
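The table above can be built in R as a data frame. The sketch below assumes DESeq2's usual colData convention: sample names as row names, with replicate and condition as columns. Check the MAVEQC documentation for the exact format create_experimentqc_object expects. Making condition a factor ordered by day is an illustrative choice.

```r
samples <- c("hgsm3_d4_r1", "hgsm3_d7_r1", "hgsm3_d15_r1",
             "hgsm3_d4_r2", "hgsm3_d7_r2", "hgsm3_d15_r2",
             "hgsm3_d4_r3", "hgsm3_d7_r3", "hgsm3_d15_r3")

coldata <- data.frame(
  replicate = rep(c("R1", "R2", "R3"), each = 3),
  condition = factor(rep(c("D4", "D7", "D15"), times = 3),
                     levels = c("D4", "D7", "D15")),  # ordered by day
  row.names = samples
)
coldata
```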
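create_experimentqc_object builds an experiment QC object from the sample QC object and coldata. run_experiment_qc then runs the DESeq2 analysis across the conditions in coldata.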
expqc <- create_experimentqc_object(samqc, coldata, "D4")
expqc <- run_experiment_qc(expqc)
## Running control deseq2 to get size factor...
## Running deseq2 on all the filtered samples...
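The first log line refers to obtaining size factors from a control DESeq2 run. DESeq2's default size-factor estimator is the median-of-ratios method, sketched below in base R. This example does not show which oligos MAVEQC uses as controls or how it passes them to DESeq2. For real analyses, use DESeq2's own estimateSizeFactors. median_of_ratios is a hypothetical helper.

```r
# Median-of-ratios size factors (rows = oligos, columns = samples)
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)  # log geometric mean per row
  keep <- is.finite(log_geo_means)       # drop rows containing a zero count
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

counts <- cbind(s1 = c(10, 20, 30, 0),
                s2 = c(20, 40, 60, 5))  # s2 sequenced twice as deeply
median_of_ratios(counts)  # s1 = 1/sqrt(2), s2 = sqrt(2)
```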
As with sample QC, plotting from the experiment QC object requires an output directory (plotdir).
qcplot_dist_samples(expqc, plotdir = outputdir)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
qcplot_pca_samples(expqc, ntop = 500, plotdir = outputdir)
qcplot_deseq_fc(expqc, plotdir = outputdir)
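The ntop argument of qcplot_pca_samples has the same name as an argument of DESeq2's plotPCA. In plotPCA, it restricts the PCA to the ntop rows with the highest variance (500 by default), usually on variance-stabilised data. This example does not confirm that qcplot_pca_samples follows the same approach. Assuming it does, the selection step looks like this in base R. pca_top_variable and the random matrix are hypothetical.

```r
# PCA on the ntop most variable rows (rows = oligos, columns = samples)
pca_top_variable <- function(mat, ntop = 500) {
  row_var <- apply(mat, 1, var)
  select <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
  pca <- prcomp(t(mat[select, , drop = FALSE]))  # samples become observations
  percent_var <- pca$sdev^2 / sum(pca$sdev^2)    # variance explained per PC
  list(scores = pca$x, percent_var = percent_var)
}

set.seed(1)
mat <- matrix(rnorm(40), nrow = 10, dimnames = list(NULL, paste0("s", 1:4)))
res <- pca_top_variable(mat, ntop = 5)
res$scores[, 1:2]  # PC1/PC2 coordinates per sample
```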